1 Data Description

First, we load EXECSAL2.txt into R. Then we rename every variable to something more descriptive.

Variable  Name        Description
y1        salary      Salary of executive
x1        experience  Experience (in years)
x2        education   Education (in years)
x3        gender      Gender (1 if male, 0 if female)
x4        emps_sup    Number of employees supervised
x5        assets      Corporate assets (in millions of USD)
x6        board_mb    Board member (1 if yes, 0 if no)
x7        age         Age (in years)
x8        profit      Company profits (in millions of USD)
x9        int_res     Has international responsibility (1 if yes, 0 if no)
x10       sales       Company’s total sales (in millions of USD)
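
The renaming step can be sketched in base R as follows. This is a minimal sketch, not the report's own code: two rows from the preview in Section 2.1 are typed in so that the snippet runs without the file, and the raw column names `y1`, `x1`, ..., `x10` are taken from the table above. The `read.table()` call shown in the comment, including `header = TRUE`, is an assumption about how EXECSAL2.txt is laid out.

```r
# In the report the data come from EXECSAL2.txt. A call such as
#   execsal <- read.table("EXECSAL2.txt", header = TRUE)
# is an assumption about the file layout. Two preview rows are typed in
# here so that the snippet is self-contained.
execsal <- data.frame(
  y1 = c(11.4436, 11.7753), x1 = c(12, 25), x2 = c(15, 14),
  x3 = c(1, 1), x4 = c(240, 510), x5 = c(170, 160), x6 = c(1, 1),
  x7 = c(44, 53), x8 = c(5, 9), x9 = c(0, 0), x10 = c(21, 28)
)

# Replace the generic names with descriptive ones, in column order.
names(execsal) <- c("salary", "experience", "education", "gender",
                    "emps_sup", "assets", "board_mb", "age",
                    "profit", "int_res", "sales")

# The summary output in Section 2.2 treats the three indicators as factors.
indicator_cols <- c("gender", "board_mb", "int_res")
execsal[indicator_cols] <- lapply(execsal[indicator_cols], factor)

str(execsal)
```

Assigning to `names()` relies on column order, so it only works if the file's columns appear in the order listed in the table.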

2 Conduct EDA

2.1 Looking at Raw Values

salary experience education gender emps_sup assets board_mb age profit int_res sales
11.4436 12 15 1 240 170 1 44 5 0 21
11.7753 25 14 1 510 160 1 53 9 0 28
11.3874 20 14 0 370 170 1 56 5 0 26
11.2172 3 19 1 170 170 1 26 9 0 24
11.6553 19 12 1 520 150 1 43 7 0 27
11.1619 14 13 0 420 160 1 53 9 0 27

2.2 Computing Summary Statistics

Skim summary statistics
 n obs: 100 
 n variables: 11 

── Variable type:factor ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n n_unique          top_counts ordered
 board_mb       0      100 100        2 0: 51, 1: 49, NA: 0   FALSE
   gender       0      100 100        2 1: 66, 0: 34, NA: 0   FALSE
  int_res       0      100 100        2 0: 82, 1: 18, NA: 0   FALSE

── Variable type:integer ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   variable missing complete   n   mean     sd  p0    p25   p50    p75 p100     hist
        age       0      100 100  42.84   9.07  23  37     42.5  49.25   64 ▃▃▇▇▆▆▃▂
     assets       0      100 100 175.1   15.41 150 160    180   190     200 ▃▇▁▆▇▁▇▃
  education       0      100 100  16.02   2.3   12  14     16    18      20 ▇▃▅▅▆▆▆▁
   emps_sup       0      100 100 340.1  167.18  60 187.5  360   492.5   600 ▇▆▅▆▇▆▇▇
 experience       0      100 100  13.08   7.34   1   7.75  13    20      26 ▇▃▆▇▃▃▇▅
     profit       0      100 100   7.7    1.55   5   6      8     9      10 ▂▇▁▇▆▁▇▆
      sales       0      100 100  24.83   2.74  20  23     25    27      30 ▃▃▃▇▂▃▂▃

── Variable type:numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n  mean   sd    p0   p25   p50   p75  p100     hist
   salary       0      100 100 11.46 0.26 10.66 11.28 11.46 11.61 12.06 ▁▁▃▇▇▇▃▂

2.2.1 Description of Summary Statistics

  • The minimum value of \(age = 23, assets = 150, education = 12, emps\_sup = 60, experience = 1, profit = 5, sales = 20, salary = 10.66\).

  • The maximum value of \(age = 64, assets = 200, education = 20, emps\_sup = 600, experience = 26, profit = 10, sales = 30, salary = 12.06\).

  • The mean value of \(age = 42.84, assets = 175.1, education = 16.02, emps\_sup = 340.1, experience = 13.08, profit = 7.7, sales = 24.83, salary = 11.46\).

  • The standard deviation of \(age = 9.07, assets = 15.41, education = 2.3, emps\_sup = 167.18, experience = 7.34, profit = 1.55, sales = 2.74, salary = 0.26\). A higher standard deviation means the values are more spread out around the mean. In absolute terms, emps_sup has the largest spread and salary the smallest, although the variables are measured in different units, so these figures are not directly comparable.

  • The middle 50% of age ranges from 37 to 49.25, assets from 160 to 190, education from 14 to 18, emps_sup from 187.5 to 492.5, experience from 7.75 to 20, profit from 6 to 9, sales from 23 to 27, and salary from 11.28 to 11.61.
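
Because the variables are measured in different units, one unit-free way to compare their spread is the coefficient of variation (standard deviation divided by mean). The sketch below is an added illustration and not part of the original analysis. It uses only the means, standard deviations, and quartiles reported in the summary output, and also computes the width of each middle-50% range.

```r
# Means, standard deviations, and quartiles as reported in the summary output.
stats <- data.frame(
  variable = c("age", "assets", "education", "emps_sup",
               "experience", "profit", "sales", "salary"),
  mean = c(42.84, 175.1, 16.02, 340.1, 13.08, 7.7, 24.83, 11.46),
  sd   = c(9.07, 15.41, 2.3, 167.18, 7.34, 1.55, 2.74, 0.26),
  p25  = c(37, 160, 14, 187.5, 7.75, 6, 23, 11.28),
  p75  = c(49.25, 190, 18, 492.5, 20, 9, 27, 11.61)
)

# Width of the middle 50% (interquartile range).
stats$iqr <- stats$p75 - stats$p25

# Coefficient of variation: spread relative to the size of the mean.
stats$cv <- stats$sd / stats$mean

# Largest relative spread first.
print(stats[order(-stats$cv), c("variable", "sd", "iqr", "cv")])
```

On these reported figures, experience has the largest relative spread and salary the smallest, while emps_sup has the largest spread in absolute terms.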

From these histograms we can see that:

# A tibble: 8 x 2
  Variable   Distribution  
  <chr>      <chr>         
1 age        Normal        
2 assets     Random        
3 education  Mostly Uniform
4 emps_sup   Mostly Uniform
5 experience Random        
6 profit     Random        
7 sales      Skewed Right  
8 salary     Skewed Left   

  • age: more executives have an age between 33.77 and 51.91 (within one standard deviation of the mean) than outside this range.

  • assets: the values follow no clear pattern across the population.

  • education: the executives are spread roughly evenly across the years of education.

  • emps_sup: the number of employees supervised is spread roughly evenly across its range.

  • experience: the years of experience follow no clear pattern across the population.

  • profit: the values follow no clear pattern across the population.

  • sales: more companies sit at the lower end of the sales range, with a thinner tail toward higher values.

  • salary: more executives sit at the upper end of the salary range, with a thinner tail toward lower values.
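
The age interval quoted in the first bullet is the mean plus or minus one standard deviation, using the values from the summary statistics. The symbols \(\overline{age}\) and \(s_{age}\) are notation introduced here for the sample mean and standard deviation:

\[\overline{age} \pm s_{age} = 42.84 \pm 9.07 \quad\Rightarrow\quad 42.84 - 9.07 = 33.77, \qquad 42.84 + 9.07 = 51.91\]

For an approximately normal variable, roughly 68% of observations fall inside such an interval, which is why more executives fall inside it than outside it.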

2.3 Creating Data Visualizations

Visual Descriptions:

  • There are more male executives (1) than female executives (0): 66 versus 34.

  • The numbers of board members (1) and non-board members (0) are approximately the same, with slightly more non-board members (51 versus 49).

  • Far more executives have no international responsibility (0) than have it (1): 82 versus 18.

  • Far fewer executives have 20 years of education than have between 12.5 and just under 20 years.

  • Most executives are between 30 and 45 years old.

  • Executives aged 60 and older form the smallest age group, far smaller than the group aged 30 to 45.

  • The mean salary for males (1) is higher than the mean salary for females (0).

  • There is a positive linear relationship between a person’s experience (in years) and their salary.

  • There is a positive linear relationship between a person’s age and their salary.

  • The mean salaries for people with international responsibility (1) and without it (0) are approximately equal.
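
The report's plotting code is not shown, so the following is only a sketch of how comparisons like these could be produced. It uses base R graphics rather than whichever package generated the original figures, and it uses only the six preview rows from Section 2.1, so its numbers will differ from those for the full 100 observations.

```r
# The six preview rows from Section 2.1 (only the columns used here).
preview <- data.frame(
  salary     = c(11.4436, 11.7753, 11.3874, 11.2172, 11.6553, 11.1619),
  experience = c(12, 25, 20, 3, 19, 14),
  gender     = factor(c(1, 1, 0, 1, 1, 0))
)

# Number of executives per gender level (what a bar chart displays).
gender_counts <- table(preview$gender)

# Mean salary per gender level.
mean_salary <- tapply(preview$salary, preview$gender, mean)

# Sign and strength of the linear association between experience and salary.
r_experience <- cor(preview$experience, preview$salary)

print(gender_counts)
print(mean_salary)
print(r_experience)

# Base-graphics versions of two of the plots described above.
boxplot(salary ~ gender, data = preview, xlab = "gender", ylab = "salary")
plot(salary ~ experience, data = preview)
```

Even on this small preview, the male group has the higher mean salary and the correlation between experience and salary is positive, in line with the descriptions above.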

2.4 Creating Linear Models

2.4.1 Experience, Education, Gender, and Assets: Parallel Slopes Model

Using a linear model with parallel slopes, we can predict an executive’s salary (in millions) based on their experience, education, gender, and assets.

Experience, education, gender, and assets each have a significant positive correlation with salary, so all four are included in our linear model.

\[\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets + 0.185 \cdot 1_{Male}(x)\]

Male executive model:

\[\hat{Salary} = 10.325 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\]

Female executive model:

\[\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\]

In our base model, which describes a female executive, we could extrapolate a salary of $10.14 million for an executive with no experience, no education, and no corporate assets. With every extra year of experience or education, one could expect salary to increase by $27,000 or $22,000 respectively, holding the other variables fixed. Male executives, on average, make $185,000 more than their female counterparts with similar experience, education, and assets.
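
To make the parallel slopes model concrete, the fitted equation can be written as a small function and applied to the first row of the raw data in Section 2.1 (a male executive with 12 years of experience, 15 years of education, and assets of 170). This is a minimal sketch: `predict_salary` is a hypothetical helper name, and the coefficients are the rounded values from the equation, not a refitted model.

```r
# Coefficients as reported in the fitted parallel slopes equation.
# `male` is the indicator: 1 for a male executive, 0 for a female executive.
predict_salary <- function(experience, education, assets, male) {
  10.14 + 0.027 * experience + 0.022 * education +
    0.003 * assets + 0.185 * male
}

# First preview row: 12 years of experience, 15 years of education, assets of 170.
male_pred   <- predict_salary(12, 15, 170, male = 1)
female_pred <- predict_salary(12, 15, 170, male = 0)

print(male_pred)                # fitted value for the male executive
print(male_pred - female_pred)  # gender gap, the same at every covariate value
```

The fitted value for this row is 11.489, against an observed salary of 11.4436. The gap between the male and female predictions is 0.185 whatever covariate values are used, which is what "parallel slopes" means.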

2.4.2 Experience & Gender: Interaction Model

Using an interaction model, we can use both the experience and gender variables to see how they interact with each other in terms of salary.

\[\hat{Salary} = 11 + 0.026 \cdot experience + 0.174 \cdot 1_{Male}(x) + 0.002 \cdot experience \cdot 1_{Male}(x)\]

Female experience model:

\[\hat{Salary}_F = 11 + 0.026 \cdot experience\]

Male experience model:

\[\hat{Salary}_M = 11.174 + 0.028 \cdot experience\]

As we can see from the models, male executives have a higher base salary than female executives and a marginally larger increase in salary for each additional year of experience. However, as the graph shows, this interaction between experience and gender is negligible: pay rises with experience at nearly the same rate for both genders.
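
The size of the interaction can be checked directly from the reported coefficients. The sketch below evaluates the male-minus-female gap at three points of the observed experience range (1 to 26 years). `salary_interaction` is a hypothetical helper name, and the coefficients are the rounded values from the equation.

```r
# Coefficients as reported in the interaction model.
# `male` is the indicator: 1 for a male executive, 0 for a female executive.
salary_interaction <- function(experience, male) {
  11 + 0.026 * experience + 0.174 * male + 0.002 * experience * male
}

# Male-minus-female gap at the low end, middle, and high end of experience.
experience <- c(1, 13, 26)
gap <- salary_interaction(experience, male = 1) -
  salary_interaction(experience, male = 0)

print(data.frame(experience, gap))
```

The gap equals \(0.174 + 0.002 \cdot experience\), so it moves only from 0.176 to 0.226 across the whole observed range, which is why the two fitted lines look almost parallel.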

2.4.3 Experience & Education: Another Representation

Intuitively, education and experience are the most important variables in predicting an executive’s salary. On the plot below, experience and education (both in years) are displayed on the two horizontal axes, and salary is displayed on the vertical axis. Salary is also color-coded, with higher salaries shown in ‘hotter’ colors. From the graph alone, we can see that more education and more experience both go with a higher salary in an executive position.

2.4.4 Predicting Profit

One might try to predict company profits based on attributes that make a good executive.

However, there is little to no correlation between profit and any other variable. The closest candidate is assets, but as the plot below shows, even this model has no visible predictive accuracy.
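
A simple regression of profit on assets can be sketched as follows. The formula `profit ~ assets` is an assumption based on the description above, since the report's model code is not shown, and only the six preview rows from Section 2.1 are used, so the fitted numbers are illustrative and not those of the full data.

```r
# Profit and assets from the six preview rows in Section 2.1.
preview <- data.frame(
  profit = c(5, 9, 5, 9, 7, 9),
  assets = c(170, 160, 170, 170, 150, 160)
)

# Simple linear regression of profit on assets.
fit <- lm(profit ~ assets, data = preview)

# Share of the variation in profit explained by assets.
r_squared <- summary(fit)$r.squared

print(coef(fit))
print(r_squared)
```

An R-squared close to zero on the full data would match the conclusion that assets carry little information about profit.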